Overview

Dataset statistics

Number of variables: 25
Number of observations: 742
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 275
Duplicate rows (%): 37.1%
Total size in memory: 674.0 KiB
Average record size in memory: 930.2 B
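The two derived figures in this block (duplicate percentage and average record size) follow from the raw counts. A minimal Python sketch that recomputes them from the values reported here; the variable names are illustrative only:

```python
# Values copied from the dataset statistics in this report.
n_observations = 742
n_duplicate_rows = 275
total_size_kib = 674.0

# Share of rows that are duplicates, as a percentage.
duplicate_pct = n_duplicate_rows / n_observations * 100

# Average record size in bytes (1 KiB = 1024 bytes).
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{duplicate_pct:.1f}%")      # 37.1%
print(f"{avg_record_bytes:.1f} B")  # 930.2 B
```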

Variable types

CAT: 11
BOOL: 8
NUM: 6

Reproduction

Analysis started: 2020-06-19 04:19:48.236762
Analysis finished: 2020-06-19 04:20:02.085298
Duration: 13.85 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Dataset has 275 (37.1%) duplicate rows (Duplicates)
Job Title has a high cardinality: 264 distinct values (High cardinality)
Location has a high cardinality: 200 distinct values (High cardinality)
Headquarters has a high cardinality: 198 distinct values (High cardinality)
Industry has a high cardinality: 60 distinct values (High cardinality)
Competitors has a high cardinality: 128 distinct values (High cardinality)
company_txt has a high cardinality: 343 distinct values (High cardinality)
max_salary is highly correlated with min_salary and 1 other field (High correlation)
min_salary is highly correlated with max_salary and 1 other field (High correlation)
avg_salary is highly correlated with min_salary and 1 other field (High correlation)
Sector is highly correlated with Industry (High correlation)
Industry is highly correlated with Sector (High correlation)
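The duplicate and cardinality warnings can be reproduced with simple counting. The sketch below is an illustration, not the report's own code. It assumes duplicates are counted the way pandas' `DataFrame.duplicated()` does by default (every repeat after the first occurrence), and it assumes a cardinality threshold of 50. That threshold is a guess: in this report the smallest flagged count is 60 (Industry), while Sector, with 25 distinct values, is not flagged. The toy rows are hypothetical.

```python
from collections import Counter

def count_duplicate_rows(rows):
    """Count rows that repeat an earlier row (the first occurrence is not counted)."""
    counts = Counter(tuple(row) for row in rows)
    return sum(c - 1 for c in counts.values())

def distinct_count(values):
    """Number of distinct values in a column."""
    return len(set(values))

def is_high_cardinality(values, threshold=50):
    """Flag a column whose distinct count exceeds the (assumed) threshold."""
    return distinct_count(values) > threshold

# Hypothetical toy rows: (job title, location).
rows = [
    ("Data Scientist", "New York, NY"),
    ("Data Engineer", "Chicago, IL"),
    ("Data Scientist", "New York, NY"),
    ("Data Analyst", "Boston, MA"),
]

print(count_duplicate_rows(rows))               # 1
print(distinct_count(row[0] for row in rows))   # 3
```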

Variables

Job Title
Categorical

HIGH CARDINALITY

Distinct count: 264
Unique (%): 35.6%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB

Value | Count | Frequency (%)
Data Scientist | 131 | 17.7%
Data Engineer | 53 | 7.1%
Senior Data Scientist | 34 | 4.6%
Data Analyst | 15 | 2.0%
Senior Data Engineer | 14 | 1.9%
Senior Data Analyst | 12 | 1.6%
Lead Data Scientist | 8 | 1.1%
Sr. Data Engineer | 6 | 0.8%
Marketing Data Analyst | 6 | 0.8%
Machine Learning Engineer | 5 | 0.7%
Principal Data Scientist | 5 | 0.7%
MED TECH/LAB SCIENTIST- SOUTH COASTAL LAB | 4 | 0.5%
Research Scientist | 4 | 0.5%
Food Scientist - Developer | 4 | 0.5%
Staff Scientist-Downstream Process Development | 4 | 0.5%
R&D Specialist/ Food Scientist | 4 | 0.5%
Sr. Data Engineer - Contract-to-Hire (Java) | 4 | 0.5%
Senior Research Scientist-Machine Learning | 4 | 0.5%
Medical Laboratory Scientist | 4 | 0.5%
Analytics Manager - Data Mart | 4 | 0.5%
Senior Data Scientist - R&D Oncology | 3 | 0.4%
Research Scientist, Immunology - Cancer Biology | 3 | 0.4%
Revenue Analytics Manager | 3 | 0.4%
Lead Data Engineer | 3 | 0.4%
Senior Scientist (Neuroscience) | 3 | 0.4%
Other values (239) | 402 | 54.2%
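A frequency table of this shape (the most common values, then a single "Other values" row for the remainder) can be built with a counter. The following is a minimal sketch under that reading of the table; `frequency_table` and its `top` parameter are hypothetical names, not part of pandas-profiling.

```python
from collections import Counter

def frequency_table(values, top=25):
    """Return (value, count, percent) rows for the `top` most common values,
    followed by one aggregated row for everything else."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = [(v, c, 100 * c / total) for v, c in counts.most_common(top)]
    remaining_distinct = len(counts) - len(rows)
    if remaining_distinct > 0:
        remaining_count = total - sum(c for _, c, _ in rows)
        label = f"Other values ({remaining_distinct})"
        rows.append((label, remaining_count, 100 * remaining_count / total))
    return rows

# Hypothetical toy column.
titles = ["a"] * 3 + ["b"] * 2 + ["c", "d"]
for value, count, pct in frequency_table(titles, top=2):
    print(f"{value} | {count} | {pct:.1f}%")
```

With 264 distinct job titles and `top=25`, the last row would aggregate the remaining 239 values, which matches the label in the table above.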
 

Length

Max length: 98
Median length: 23
Mean length: 27.94204852
Min length: 9
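These length figures are summary statistics over the character count of each value. A small sketch, assuming length means the number of Unicode code points (what Python's `len` returns for a `str`); the sample titles are taken from the table above and the function name is illustrative:

```python
from statistics import mean, median

def length_summary(values):
    """Summarise the string lengths of a text column."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }

sample = ["Data Scientist", "Data Engineer", "Data Analyst"]
print(length_summary(sample))
```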

Overview of Unicode Properties

Unique Unicode characters: 66
Unique Unicode categories: 9
Unique Unicode scripts: 2
Unique Unicode blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
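As an illustration of the category breakdown, Python's standard `unicodedata` module exposes the general category of each character. The sketch below counts categories over a list of strings. The mapping from two-letter codes to long names covers only the categories that appear in this report, and the function name is hypothetical. The standard library does not expose script or block properties, so those are not covered here.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter", "Lu": "Uppercase Letter", "Zs": "Space Separator",
    "Po": "Other Punctuation", "Pd": "Dash Punctuation", "Ps": "Open Punctuation",
    "Pe": "Close Punctuation", "Nd": "Decimal Number", "Sm": "Math Symbol",
    "Sc": "Currency Symbol",
}

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

print(category_counts(["Data Scientist"]))
```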

Most occurring characters

ValueCountFrequency (%) 
t211410.2%
 
208110.0%
 
a20049.7%
 
i18528.9%
 
e17448.4%
 
n16057.7%
 
c9404.5%
 
s9384.5%
 
r8574.1%
 
S7993.9%
 
D6613.2%
 
o6413.1%
 
l5322.6%
 
g3741.8%
 
A2901.4%
 
y2871.4%
 
E2611.3%
 
-1840.9%
 
u1750.8%
 
m1680.8%
 
M1510.7%
 
I1500.7%
 
d1390.7%
 
C1360.7%
 
p1230.6%
 
Other values (41)15277.4%
 

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 14912 | 71.9%
Uppercase Letter | 3194 | 15.4%
Space Separator | 2081 | 10.0%
Other Punctuation | 228 | 1.1%
Dash Punctuation | 193 | 0.9%
Open Punctuation | 42 | 0.2%
Decimal Number | 41 | 0.2%
Close Punctuation | 41 | 0.2%
Math Symbol | 1 | < 0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
S79925.0%
 
D66120.7%
 
A2909.1%
 
E2618.2%
 
M1514.7%
 
I1504.7%
 
C1364.3%
 
L1183.7%
 
P1113.5%
 
R1023.2%
 
T902.8%
 
B662.1%
 
O561.8%
 
N481.5%
 
H361.1%
 
G220.7%
 
F220.7%
 
V210.7%
 
U170.5%
 
Q140.4%
 
W70.2%
 
J70.2%
 
Y50.2%
 
K40.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
t211414.2%
 
a200413.4%
 
i185212.4%
 
e174411.7%
 
n160510.8%
 
c9406.3%
 
s9386.3%
 
r8575.7%
 
o6414.3%
 
l5323.6%
 
g3742.5%
 
y2871.9%
 
u1751.2%
 
m1681.1%
 
d1390.9%
 
p1230.8%
 
h1210.8%
 
f1030.7%
 
v760.5%
 
b350.2%
 
k320.2%
 
w260.2%
 
x110.1%
 
j80.1%
 
z7< 0.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
2081100.0%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-18495.3%
 
94.7%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
,10345.2%
 
/6227.2%
 
.3515.4%
 
&2611.4%
 
:20.9%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
21434.1%
 
01229.3%
 
1819.5%
 
437.3%
 
924.9%
 
524.9%
 

Most frequent Open Punctuation characters

ValueCountFrequency (%) 
(42100.0%
 

Most frequent Close Punctuation characters

ValueCountFrequency (%) 
)41100.0%
 

Most frequent Math Symbol characters

ValueCountFrequency (%) 
|1100.0%
 

Most occurring scripts

Value | Count | Frequency (%)
Latin | 18106 | 87.3%
Common | 2627 | 12.7%
 

Most frequent Latin characters

ValueCountFrequency (%) 
t211411.7%
 
a200411.1%
 
i185210.2%
 
e17449.6%
 
n16058.9%
 
c9405.2%
 
s9385.2%
 
r8574.7%
 
S7994.4%
 
D6613.7%
 
o6413.5%
 
l5322.9%
 
g3742.1%
 
A2901.6%
 
y2871.6%
 
E2611.4%
 
u1751.0%
 
m1680.9%
 
M1510.8%
 
I1500.8%
 
d1390.8%
 
C1360.8%
 
p1230.7%
 
h1210.7%
 
L1180.7%
 
Other values (24)9265.1%
 

Most frequent Common characters

ValueCountFrequency (%) 
208179.2%
 
-1847.0%
 
,1033.9%
 
/622.4%
 
(421.6%
 
)411.6%
 
.351.3%
 
&261.0%
 
2140.5%
 
0120.5%
 
90.3%
 
180.3%
 
430.1%
 
920.1%
 
520.1%
 
:20.1%
 
|1< 0.1%
 

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 20724 | > 99.9%
Punctuation | 9 | < 0.1%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
t211410.2%
 
208110.0%
 
a20049.7%
 
i18528.9%
 
e17448.4%
 
n16057.7%
 
c9404.5%
 
s9384.5%
 
r8574.1%
 
S7993.9%
 
D6613.2%
 
o6413.1%
 
l5322.6%
 
g3741.8%
 
A2901.4%
 
y2871.4%
 
E2611.3%
 
-1840.9%
 
u1750.8%
 
m1680.8%
 
M1510.7%
 
I1500.7%
 
d1390.7%
 
C1360.7%
 
p1230.6%
 
Other values (40)15187.3%
 

Most frequent Punctuation characters

ValueCountFrequency (%) 
9100.0%
 

Rating
Real number (ℝ)

Distinct count: 31
Unique (%): 4.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.6188679245283017
Minimum: -1.0
Maximum: 5.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: -1
5-th percentile: 2.6
Q1: 3.3
Median: 3.7
Q3: 4
95-th percentile: 4.7
Maximum: 5
Range: 6
Interquartile range (IQR): 0.7

Descriptive statistics

Standard deviation: 0.8012101585
Coefficient of variation (CV): 0.2213980104
Kurtosis: 14.30412724
Mean: 3.618867925
Median Absolute Deviation (MAD): 0.35
Skewness: -2.814019554
Sum: 2685.2
Variance: 0.641937718
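Several of the Rating statistics are derived from the others: the range is maximum minus minimum, the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, and the variance is the standard deviation squared. A short sketch that checks these relations against the reported values (variable names are illustrative):

```python
# Values copied from the Rating statistics in this report.
minimum, maximum = -1.0, 5.0
q1, q3 = 3.3, 4.0
mean = 3.618867925
std = 0.8012101585

value_range = maximum - minimum  # reported: 6
iqr = q3 - q1                    # reported: 0.7
cv = std / mean                  # reported: 0.2213980104
variance = std ** 2              # reported: 0.641937718

print(value_range, round(iqr, 2), round(cv, 4), round(variance, 4))
```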
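Histogram with fixed size bins (bins=10)

Most frequent values
Value | Count | Frequency (%)
3.9 | 63 | 8.5%
3.8 | 61 | 8.2%
3.7 | 61 | 8.2%
3.5 | 49 | 6.6%
4 | 47 | 6.3%
3.6 | 46 | 6.2%
3.4 | 44 | 5.9%
3.3 | 39 | 5.3%
3.2 | 35 | 4.7%
4.4 | 33 | 4.4%
4.3 | 32 | 4.3%
4.7 | 31 | 4.2%
4.2 | 26 | 3.5%
3.1 | 25 | 3.4%
4.1 | 19 | 2.6%
2.9 | 18 | 2.4%
3 | 17 | 2.3%
2.7 | 14 | 1.9%
2.6 | 12 | 1.6%
-1 | 11 | 1.5%
4.6 | 10 | 1.3%
4.8 | 9 | 1.2%
2.8 | 7 | 0.9%
2.4 | 7 | 0.9%
4.5 | 7 | 0.9%
Other values (6) | 19 | 2.6%

Smallest values
Value | Count | Frequency (%)
-1 | 11 | 1.5%
1.9 | 3 | 0.4%
2.1 | 5 | 0.7%
2.2 | 2 | 0.3%
2.3 | 2 | 0.3%
2.4 | 7 | 0.9%
2.5 | 2 | 0.3%
2.6 | 12 | 1.6%
2.7 | 14 | 1.9%
2.8 | 7 | 0.9%

Largest values
Value | Count | Frequency (%)
5 | 5 | 0.7%
4.8 | 9 | 1.2%
4.7 | 31 | 4.2%
4.6 | 10 | 1.3%
4.5 | 7 | 0.9%
4.4 | 33 | 4.4%
4.3 | 32 | 4.3%
4.2 | 26 | 3.5%
4.1 | 19 | 2.6%
4 | 47 | 6.3%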
 

Location
Categorical

HIGH CARDINALITY

Distinct count: 200
Unique (%): 27.0%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB

Value | Count | Frequency (%)
New York, NY | 55 | 7.4%
San Francisco, CA | 49 | 6.6%
Cambridge, MA | 47 | 6.3%
Chicago, IL | 32 | 4.3%
Boston, MA | 23 | 3.1%
San Jose, CA | 13 | 1.8%
Pittsburgh, PA | 12 | 1.6%
Rockville, MD | 11 | 1.5%
Washington, DC | 11 | 1.5%
Richland, WA | 10 | 1.3%
Winston-Salem, NC | 10 | 1.3%
Herndon, VA | 10 | 1.3%
Indianapolis, IN | 9 | 1.2%
San Diego, CA | 9 | 1.2%
South San Francisco, CA | 8 | 1.1%
Austin, TX | 8 | 1.1%
Mountain View, CA | 8 | 1.1%
Palo Alto, CA | 7 | 0.9%
Rochester, NY | 7 | 0.9%
Phoenix, AZ | 6 | 0.8%
Marlborough, MA | 6 | 0.8%
Salt Lake City, UT | 6 | 0.8%
Huntsville, AL | 6 | 0.8%
Gaithersburg, MD | 6 | 0.8%
Charlotte, NC | 6 | 0.8%
Other values (175) | 367 | 49.5%
 

Length

Max length: 33
Median length: 13
Mean length: 13.1509434
Min length: 8

Overview of Unicode Properties

Unique Unicode characters: 54
Unique Unicode categories: 5
Unique Unicode scripts: 2
Unique Unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
99210.2%
 
,7437.6%
 
a6076.2%
 
o5335.5%
 
n5295.4%
 
e5145.3%
 
i4935.1%
 
A4384.5%
 
r4244.3%
 
l3483.6%
 
C3363.4%
 
t3203.3%
 
s2762.8%
 
c2282.3%
 
N2112.2%
 
M2022.1%
 
g1781.8%
 
h1721.8%
 
d1581.6%
 
S1531.6%
 
Y1331.4%
 
u1281.3%
 
L1061.1%
 
m1051.1%
 
k1051.1%
 
Other values (29)132613.6%
 

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 5525 | 56.6%
Uppercase Letter | 2488 | 25.5%
Space Separator | 992 | 10.2%
Other Punctuation | 743 | 7.6%
Dash Punctuation | 10 | 0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
A43817.6%
 
C33613.5%
 
N2118.5%
 
M2028.1%
 
S1536.1%
 
Y1335.3%
 
L1064.3%
 
F933.7%
 
I893.6%
 
D873.5%
 
P843.4%
 
W722.9%
 
B612.5%
 
V612.5%
 
T602.4%
 
H582.3%
 
R562.3%
 
O562.3%
 
J351.4%
 
X281.1%
 
G170.7%
 
K150.6%
 
E150.6%
 
U100.4%
 
Z90.4%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
a60711.0%
 
o5339.6%
 
n5299.6%
 
e5149.3%
 
i4938.9%
 
r4247.7%
 
l3486.3%
 
t3205.8%
 
s2765.0%
 
c2284.1%
 
g1783.2%
 
h1723.1%
 
d1582.9%
 
u1282.3%
 
m1051.9%
 
k1051.9%
 
b1021.8%
 
w911.6%
 
v681.2%
 
p531.0%
 
y430.8%
 
f240.4%
 
x180.3%
 
q60.1%
 
j2< 0.1%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
,743100.0%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
992100.0%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-10100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin801382.1%
 
Common174517.9%
 

Most frequent Latin characters

ValueCountFrequency (%) 
a6077.6%
 
o5336.7%
 
n5296.6%
 
e5146.4%
 
i4936.2%
 
A4385.5%
 
r4245.3%
 
l3484.3%
 
C3364.2%
 
t3204.0%
 
s2763.4%
 
c2282.8%
 
N2112.6%
 
M2022.5%
 
g1782.2%
 
h1722.1%
 
d1582.0%
 
S1531.9%
 
Y1331.7%
 
u1281.6%
 
L1061.3%
 
m1051.3%
 
k1051.3%
 
b1021.3%
 
F931.2%
 
Other values (26)112114.0%
 

Most frequent Common characters

ValueCountFrequency (%) 
99256.8%
 
,74342.6%
 
-100.6%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII9758100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
99210.2%
 
,7437.6%
 
a6076.2%
 
o5335.5%
 
n5295.4%
 
e5145.3%
 
i4935.1%
 
A4384.5%
 
r4244.3%
 
l3483.6%
 
C3363.4%
 
t3203.3%
 
s2762.8%
 
c2282.3%
 
N2112.2%
 
M2022.1%
 
g1781.8%
 
h1721.8%
 
d1581.6%
 
S1531.6%
 
Y1331.4%
 
u1281.3%
 
L1061.1%
 
m1051.1%
 
k1051.1%
 
Other values (29)132613.6%
 

Headquarters
Categorical

HIGH CARDINALITY

Distinct count: 198
Unique (%): 26.7%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB

Value | Count | Frequency (%)
New York, NY | 52 | 7.0%
San Francisco, CA | 42 | 5.7%
Chicago, IL | 30 | 4.0%
Cambridge, MA | 20 | 2.7%
Winston-Salem, NC | 14 | 1.9%
Boston, MA | 14 | 1.9%
OSAKA, Japan | 14 | 1.9%
Springfield, MA | 14 | 1.9%
Reston, VA | 12 | 1.6%
Richland, WA | 12 | 1.6%
Pittsburgh, PA | 11 | 1.5%
Mountain View, CA | 11 | 1.5%
Palo Alto, CA | 10 | 1.3%
Cambridge, United Kingdom | 9 | 1.2%
Salt Lake City, UT | 9 | 1.2%
Washington, DC | 9 | 1.2%
Denver, CO | 8 | 1.1%
San Jose, CA | 8 | 1.1%
Rockville, MD | 8 | 1.1%
Bedford, MA | 8 | 1.1%
Herndon, VA | 8 | 1.1%
San Rafael, CA | 7 | 0.9%
Chadds Ford, PA | 7 | 0.9%
Basel, Switzerland | 7 | 0.9%
Rochester, NY | 7 | 0.9%
Other values (173) | 391 | 52.7%
 

Length

Max length: 26
Median length: 13
Mean length: 13.606469
Min length: 2

Overview of Unicode Properties

Unique Unicode characters: 55
Unique Unicode categories: 6
Unique Unicode scripts: 2
Unique Unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
103310.2%
 
,7417.3%
 
a6476.4%
 
n6156.1%
 
e5605.5%
 
o5285.2%
 
i5105.1%
 
A4594.5%
 
r4224.2%
 
l3703.7%
 
C3503.5%
 
t3483.4%
 
s2592.6%
 
d2372.3%
 
c2192.2%
 
N1921.9%
 
S1811.8%
 
M1781.8%
 
g1781.8%
 
h1591.6%
 
Y1151.1%
 
m1141.1%
 
u1111.1%
 
w1101.1%
 
L1071.1%
 
Other values (30)135313.4%
 

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 5779 | 57.2%
Uppercase Letter | 2525 | 25.0%
Space Separator | 1033 | 10.2%
Other Punctuation | 741 | 7.3%
Dash Punctuation | 17 | 0.2%
Decimal Number | 1 | < 0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
A45918.2%
 
C35013.9%
 
N1927.6%
 
S1817.2%
 
M1787.0%
 
Y1154.6%
 
L1074.2%
 
F1024.0%
 
I823.2%
 
P803.2%
 
V793.1%
 
W753.0%
 
D732.9%
 
B712.8%
 
R712.8%
 
O582.3%
 
T502.0%
 
J401.6%
 
K391.5%
 
H321.3%
 
U281.1%
 
X210.8%
 
G180.7%
 
E160.6%
 
Z50.2%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
a64711.2%
 
n61510.6%
 
e5609.7%
 
o5289.1%
 
i5108.8%
 
r4227.3%
 
l3706.4%
 
t3486.0%
 
s2594.5%
 
d2374.1%
 
c2193.8%
 
g1783.1%
 
h1592.8%
 
m1142.0%
 
u1111.9%
 
w1101.9%
 
k1021.8%
 
b661.1%
 
p591.0%
 
y520.9%
 
v460.8%
 
f420.7%
 
x140.2%
 
z100.2%
 
j1< 0.1%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
,741100.0%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
1033100.0%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-17100.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
11100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin830482.3%
 
Common179217.7%
 

Most frequent Latin characters

ValueCountFrequency (%) 
a6477.8%
 
n6157.4%
 
e5606.7%
 
o5286.4%
 
i5106.1%
 
A4595.5%
 
r4225.1%
 
l3704.5%
 
C3504.2%
 
t3484.2%
 
s2593.1%
 
d2372.9%
 
c2192.6%
 
N1922.3%
 
S1812.2%
 
M1782.1%
 
g1782.1%
 
h1591.9%
 
Y1151.4%
 
m1141.4%
 
u1111.3%
 
w1101.3%
 
L1071.3%
 
F1021.2%
 
k1021.2%
 
Other values (26)113113.6%
 

Most frequent Common characters

ValueCountFrequency (%) 
103357.6%
 
,74141.4%
 
-170.9%
 
110.1%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII10096100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
103310.2%
 
,7417.3%
 
a6476.4%
 
n6156.1%
 
e5605.5%
 
o5285.2%
 
i5105.1%
 
A4594.5%
 
r4224.2%
 
l3703.7%
 
C3503.5%
 
t3483.4%
 
s2592.6%
 
d2372.3%
 
c2192.2%
 
N1921.9%
 
S1811.8%
 
M1781.8%
 
g1781.8%
 
h1591.6%
 
Y1151.1%
 
m1141.1%
 
u1111.1%
 
w1101.1%
 
L1071.1%
 
Other values (30)135313.4%
 

Size
Categorical

Distinct count: 9
Unique (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB

Value | Count | Frequency (%)
1001 to 5000 employees | 150 | 20.2%
501 to 1000 employees | 134 | 18.1%
10000+ employees | 130 | 17.5%
201 to 500 employees | 117 | 15.8%
51 to 200 employees | 94 | 12.7%
5001 to 10000 employees | 76 | 10.2%
1 to 50 employees | 31 | 4.2%
Unknown | 9 | 1.2%
-1 | 1 | 0.1%
 

Length

Max length: 23
Median length: 20
Mean length: 19.7574124
Min length: 2

Overview of Unicode Properties

Unique Unicode characters: 19
Unique Unicode categories: 6
Unique Unicode scripts: 2
Unique Unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
0283219.3%
 
e219615.0%
 
193613.2%
 
o13439.2%
 
110937.5%
 
m7325.0%
 
p7325.0%
 
l7325.0%
 
y7325.0%
 
s7325.0%
 
56024.1%
 
t6024.1%
 
22111.4%
 
+1300.9%
 
n270.2%
 
U90.1%
 
k90.1%
 
w90.1%
 
-1< 0.1%
 

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 7846 | 53.5%
Decimal Number | 4738 | 32.3%
Space Separator | 1936 | 13.2%
Math Symbol | 130 | 0.9%
Uppercase Letter | 9 | 0.1%
Dash Punctuation | 1 | < 0.1%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
0283259.8%
 
1109323.1%
 
560212.7%
 
22114.5%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
1936100.0%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e219628.0%
 
o134317.1%
 
m7329.3%
 
p7329.3%
 
l7329.3%
 
y7329.3%
 
s7329.3%
 
t6027.7%
 
n270.3%
 
k90.1%
 
w90.1%
 

Most frequent Math Symbol characters

ValueCountFrequency (%) 
+130100.0%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
U9100.0%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-1100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin785553.6%
 
Common680546.4%
 

Most frequent Common characters

ValueCountFrequency (%) 
0283241.6%
 
193628.4%
 
1109316.1%
 
56028.8%
 
22113.1%
 
+1301.9%
 
-1< 0.1%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e219628.0%
 
o134317.1%
 
m7329.3%
 
p7329.3%
 
l7329.3%
 
y7329.3%
 
s7329.3%
 
t6027.7%
 
n270.3%
 
U90.1%
 
k90.1%
 
w90.1%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII14660100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
0283219.3%
 
e219615.0%
 
193613.2%
 
o13439.2%
 
110937.5%
 
m7325.0%
 
p7325.0%
 
l7325.0%
 
y7325.0%
 
s7325.0%
 
56024.1%
 
t6024.1%
 
22111.4%
 
+1300.9%
 
n270.2%
 
U90.1%
 
k90.1%
 
w90.1%
 
-1< 0.1%
 

Founded
Real number (ℝ)

Distinct count: 102
Unique (%): 13.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1837.154986522911
Minimum: -1
Maximum: 2019
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: -1
5-th percentile: -1
Q1: 1939
Median: 1988
Q3: 2007
95-th percentile: 2014
Maximum: 2019
Range: 2020
Interquartile range (IQR): 68

Descriptive statistics

Standard deviation: 497.1837627
Coefficient of variation (CV): 0.270627011
Kurtosis: 9.705374859
Mean: 1837.154987
Median Absolute Deviation (MAD): 22
Skewness: -3.394532023
Sum: 1363169
Variance: 247191.6939
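The mean of about 1837 is pulled down by the value -1, which the value counts for this variable list 50 times (6.7%) and which is also the 5-th percentile. Assuming -1 is a placeholder for an unknown founding year (the report itself does not say so), the mean over the remaining rows can be recovered from the reported sum and counts alone. A minimal sketch; the function name is hypothetical:

```python
def mean_without_sentinel(total, n, sentinel, sentinel_count):
    """Mean of a column after removing `sentinel_count` copies of `sentinel`,
    computed from the column's sum and row count."""
    return (total - sentinel * sentinel_count) / (n - sentinel_count)

# Sum, row count and the count of -1 values as reported for Founded.
founded_mean = mean_without_sentinel(
    total=1363169, n=742, sentinel=-1, sentinel_count=50
)
print(round(founded_mean, 2))  # 1969.97
```

The same adjustment applies to any numeric column that uses a sentinel, provided the sentinel count is known.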
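Histogram with fixed size bins (bins=10)

Most frequent values
Value | Count | Frequency (%)
-1 | 50 | 6.7%
2010 | 32 | 4.3%
2008 | 31 | 4.2%
1996 | 27 | 3.6%
2006 | 24 | 3.2%
2012 | 21 | 2.8%
2011 | 19 | 2.6%
1958 | 18 | 2.4%
2007 | 18 | 2.4%
2002 | 18 | 2.4%
1984 | 18 | 2.4%
2015 | 16 | 2.2%
2013 | 15 | 2.0%
1851 | 14 | 1.9%
1875 | 14 | 1.9%
1781 | 14 | 1.9%
1997 | 14 | 1.9%
2014 | 13 | 1.8%
1999 | 12 | 1.6%
1965 | 12 | 1.6%
2017 | 12 | 1.6%
2000 | 10 | 1.3%
2003 | 10 | 1.3%
2005 | 10 | 1.3%
1935 | 10 | 1.3%
Other values (77) | 290 | 39.1%

Smallest values
Value | Count | Frequency (%)
-1 | 50 | 6.7%
1744 | 1 | 0.1%
1781 | 14 | 1.9%
1812 | 1 | 0.1%
1830 | 4 | 0.5%
1846 | 2 | 0.3%
1849 | 7 | 0.9%
1850 | 1 | 0.1%
1851 | 14 | 1.9%
1852 | 5 | 0.7%

Largest values
Value | Count | Frequency (%)
2019 | 2 | 0.3%
2017 | 12 | 1.6%
2016 | 5 | 0.7%
2015 | 16 | 2.2%
2014 | 13 | 1.8%
2013 | 15 | 2.0%
2012 | 21 | 2.8%
2011 | 19 | 2.6%
2010 | 32 | 4.3%
2009 | 6 | 0.8%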
 
Distinct count: 11
Unique (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB

Value | Count | Frequency (%)
Company - Private | 410 | 55.3%
Company - Public | 193 | 26.0%
Nonprofit Organization | 55 | 7.4%
Subsidiary or Business Segment | 34 | 4.6%
Government | 15 | 2.0%
Hospital | 15 | 2.0%
College / University | 13 | 1.8%
Other Organization | 3 | 0.4%
School / School District | 2 | 0.3%
-1 | 1 | 0.1%
Unknown | 1 | 0.1%
 

Length

Max length: 30
Median length: 17
Mean length: 17.4245283
Min length: 2

Overview of Unicode Properties

Unique Unicode characters: 37
Unique Unicode categories: 6
Unique Unicode scripts: 2
Unique Unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
139810.8%
 
a11789.1%
 
i9217.1%
 
n8886.9%
 
o8576.6%
 
p6735.2%
 
m6525.0%
 
y6505.0%
 
r6244.8%
 
C6164.8%
 
t6074.7%
 
-6044.7%
 
P6034.7%
 
e5844.5%
 
v4383.4%
 
u2612.0%
 
l2381.8%
 
b2271.8%
 
c1991.5%
 
s1661.3%
 
g1050.8%
 
S720.6%
 
O610.5%
 
z580.4%
 
N550.4%
 
Other values (12)1941.5%
 

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 9424 | 72.9%
Uppercase Letter | 1487 | 11.5%
Space Separator | 1398 | 10.8%
Dash Punctuation | 604 | 4.7%
Other Punctuation | 15 | 0.1%
Decimal Number | 1 | < 0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
C61641.4%
 
P60340.6%
 
S724.8%
 
O614.1%
 
N553.7%
 
B342.3%
 
G151.0%
 
H151.0%
 
U140.9%
 
D20.1%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
a117812.5%
 
i9219.8%
 
n8889.4%
 
o8579.1%
 
p6737.1%
 
m6526.9%
 
y6506.9%
 
r6246.6%
 
t6076.4%
 
e5846.2%
 
v4384.6%
 
u2612.8%
 
l2382.5%
 
b2272.4%
 
c1992.1%
 
s1661.8%
 
g1051.1%
 
z580.6%
 
f550.6%
 
d340.4%
 
h70.1%
 
k1< 0.1%
 
w1< 0.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
1398100.0%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-604100.0%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
/15100.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
11100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin1091184.4%
 
Common201815.6%
 

Most frequent Latin characters

ValueCountFrequency (%) 
a117810.8%
 
i9218.4%
 
n8888.1%
 
o8577.9%
 
p6736.2%
 
m6526.0%
 
y6506.0%
 
r6245.7%
 
C6165.6%
 
t6075.6%
 
P6035.5%
 
e5845.4%
 
v4384.0%
 
u2612.4%
 
l2382.2%
 
b2272.1%
 
c1991.8%
 
s1661.5%
 
g1051.0%
 
S720.7%
 
O610.6%
 
z580.5%
 
N550.5%
 
f550.5%
 
d340.3%
 
Other values (8)890.8%
 

Most frequent Common characters

ValueCountFrequency (%) 
139869.3%
 
-60429.9%
 
/150.7%
 
11< 0.1%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII12929100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
139810.8%
 
a11789.1%
 
i9217.1%
 
n8886.9%
 
o8576.6%
 
p6735.2%
 
m6525.0%
 
y6505.0%
 
r6244.8%
 
C6164.8%
 
t6074.7%
 
-6044.7%
 
P6034.7%
 
e5844.5%
 
v4383.4%
 
u2612.0%
 
l2381.8%
 
b2271.8%
 
c1991.5%
 
s1661.3%
 
g1050.8%
 
S720.6%
 
O610.5%
 
z580.4%
 
N550.4%
 
Other values (12)1941.5%
 

Industry
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct count: 60
Unique (%): 8.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB

Value | Count | Frequency (%)
Biotech & Pharmaceuticals | 112 | 15.1%
Insurance Carriers | 63 | 8.5%
Computer Hardware & Software | 59 | 8.0%
IT Services | 50 | 6.7%
Health Care Services & Hospitals | 49 | 6.6%
Enterprise Software & Network Solutions | 42 | 5.7%
Internet | 29 | 3.9%
Consulting | 29 | 3.9%
Advertising & Marketing | 25 | 3.4%
Aerospace & Defense | 25 | 3.4%
Consumer Products Manufacturing | 20 | 2.7%
Research & Development | 19 | 2.6%
Colleges & Universities | 16 | 2.2%
Energy | 14 | 1.9%
Banks & Credit Unions | 12 | 1.6%
Federal Agencies | 11 | 1.5%
Staffing & Outsourcing | 10 | 1.3%
-1 | 10 | 1.3%
Real Estate | 8 | 1.1%
Travel Agencies | 8 | 1.1%
Food & Beverage Manufacturing | 8 | 1.1%
Lending | 8 | 1.1%
Financial Analytics & Research | 8 | 1.1%
Security Services | 7 | 0.9%
Department, Clothing, & Shoe Stores | 6 | 0.8%
Other values (35) | 94 | 12.7%
 

Length

Max length: 40
Median length: 23
Mean length: 21.9083558
Min length: 2

Overview of Unicode Properties

Unique Unicode characters: 52
Unique Unicode categories: 6
Unique Unicode scripts: 2
Unique Unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
e178411.0%
 
14068.6%
 
r13408.2%
 
a12187.5%
 
t10486.4%
 
i9826.0%
 
s9696.0%
 
n8485.2%
 
c7634.7%
 
o7464.6%
 
u5013.1%
 
l4352.7%
 
&4162.6%
 
h3292.0%
 
S3131.9%
 
g2761.7%
 
C2671.6%
 
m2531.6%
 
p2171.3%
 
w2041.3%
 
v2011.2%
 
f1821.1%
 
d1661.0%
 
H1591.0%
 
I1571.0%
 
Other values (27)10766.6%
 

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 12614 | 77.6%
Uppercase Letter | 1774 | 10.9%
Space Separator | 1406 | 8.6%
Other Punctuation | 430 | 2.6%
Decimal Number | 18 | 0.1%
Dash Punctuation | 14 | 0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
S31317.6%
 
C26715.1%
 
H1599.0%
 
I1578.9%
 
B1518.5%
 
P1438.1%
 
A985.5%
 
E794.5%
 
T784.4%
 
M734.1%
 
D512.9%
 
R452.5%
 
N442.5%
 
F331.9%
 
U281.6%
 
O171.0%
 
G140.8%
 
L120.7%
 
V50.3%
 
K40.2%
 
W30.2%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e178414.1%
 
r134010.6%
 
a12189.7%
 
t10488.3%
 
i9827.8%
 
s9697.7%
 
n8486.7%
 
c7636.0%
 
o7465.9%
 
u5014.0%
 
l4353.4%
 
h3292.6%
 
g2762.2%
 
m2532.0%
 
p2171.7%
 
w2041.6%
 
v2011.6%
 
f1821.4%
 
d1661.3%
 
k1000.8%
 
y350.3%
 
b70.1%
 
z6< 0.1%
 
x3< 0.1%
 
q1< 0.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
1406100.0%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
&41696.7%
 
,143.3%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-14100.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
11477.8%
 
2422.2%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin1438888.5%
 
Common186811.5%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e178412.4%
 
r13409.3%
 
a12188.5%
 
t10487.3%
 
i9826.8%
 
s9696.7%
 
n8485.9%
 
c7635.3%
 
o7465.2%
 
u5013.5%
 
l4353.0%
 
h3292.3%
 
S3132.2%
 
g2761.9%
 
C2671.9%
 
m2531.8%
 
p2171.5%
 
w2041.4%
 
v2011.4%
 
f1821.3%
 
d1661.2%
 
H1591.1%
 
I1571.1%
 
B1511.0%
 
P1431.0%
 
Other values (21)7365.1%
 

Most frequent Common characters

ValueCountFrequency (%) 
140675.3%
 
&41622.3%
 
,140.7%
 
-140.7%
 
1140.7%
 
240.2%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII16256100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
e178411.0%
 
14068.6%
 
r13408.2%
 
a12187.5%
 
t10486.4%
 
i9826.0%
 
s9696.0%
 
n8485.2%
 
c7634.7%
 
o7464.6%
 
u5013.1%
 
l4352.7%
 
&4162.6%
 
h3292.0%
 
S3131.9%
 
g2761.7%
 
C2671.6%
 
m2531.6%
 
p2171.3%
 
w2041.3%
 
v2011.2%
 
f1821.1%
 
d1661.0%
 
H1591.0%
 
I1571.0%
 
Other values (27)10766.6%
 

Sector
Categorical

HIGH CORRELATION

Distinct count: 25
Unique (%): 3.4%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB

Value | Count | Frequency (%)
Information Technology | 180 | 24.3%
Biotech & Pharmaceuticals | 112 | 15.1%
Business Services | 97 | 13.1%
Insurance | 69 | 9.3%
Health Care | 49 | 6.6%
Finance | 42 | 5.7%
Manufacturing | 34 | 4.6%
Aerospace & Defense | 25 | 3.4%
Education | 23 | 3.1%
Retail | 15 | 2.0%
Oil, Gas, Energy & Utilities | 14 | 1.9%
Government | 11 | 1.5%
-1 | 10 | 1.3%
Non-Profit | 9 | 1.2%
Real Estate | 8 | 1.1%
Transportation & Logistics | 8 | 1.1%
Travel & Tourism | 8 | 1.1%
Media | 6 | 0.8%
Telecommunications | 6 | 0.8%
Arts, Entertainment & Recreation | 4 | 0.5%
Consumer Services | 4 | 0.5%
Mining & Metals | 3 | 0.4%
Construction, Repair & Maintenance | 3 | 0.4%
Accounting & Legal | 1 | 0.1%
Agriculture & Forestry | 1 | 0.1%
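The report flags Sector and Industry as highly correlated but does not state here which association measure triggered the warning. One common measure for two categorical variables is Cramér's V, which is 0 for independent columns and 1 when one column fully determines the other. The sketch below is an illustration under that assumption (the uncorrected form of the statistic, with made-up pairs), not the report's own computation.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Hypothetical pairs in which each industry belongs to exactly one sector.
sectors = ["Information Technology", "Information Technology", "Insurance", "Insurance"]
industries = ["IT Services", "Internet", "Insurance Carriers", "Insurance Carriers"]
print(cramers_v(sectors, industries))  # 1.0
```

A strong association is what one would expect if every industry maps to a single sector, as the shared "Biotech & Pharmaceuticals" count of 112 in both tables suggests.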
 

Length

Max length: 34
Median length: 17
Mean length: 17.02695418
Min length: 2

Overview of Unicode Properties

Unique Unicode characters: 42
Unique Unicode categories: 6
Unique Unicode scripts: 2
Unique Unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
e11799.3%
 
n10918.6%
 
o9697.7%
 
a9437.5%
 
i8566.8%
 
c8436.7%
 
7315.8%
 
s7125.6%
 
r6625.2%
 
t6545.2%
 
h4533.6%
 
l4113.3%
 
u3933.1%
 
m3312.6%
 
I2492.0%
 
f2482.0%
 
g2421.9%
 
T2101.7%
 
B2091.7%
 
y1951.5%
 
&1791.4%
 
P1211.0%
 
v1200.9%
 
S1010.8%
 
C560.4%
 
Other values (17)4763.8%
 

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 10367 | 82.1%
Uppercase Letter | 1293 | 10.2%
Space Separator | 731 | 5.8%
Other Punctuation | 214 | 1.7%
Dash Punctuation | 19 | 0.2%
Decimal Number | 10 | 0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
I24919.3%
 
T21016.2%
 
B20916.2%
 
P1219.4%
 
S1017.8%
 
C564.3%
 
H493.8%
 
E493.8%
 
M493.8%
 
F433.3%
 
A312.4%
 
R302.3%
 
D251.9%
 
G251.9%
 
O141.1%
 
U141.1%
 
L90.7%
 
N90.7%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e117911.4%
 
n109110.5%
 
o9699.3%
 
a9439.1%
 
i8568.3%
 
c8438.1%
 
s7126.9%
 
r6626.4%
 
t6546.3%
 
h4534.4%
 
l4114.0%
 
u3933.8%
 
m3313.2%
 
f2482.4%
 
g2422.3%
 
y1951.9%
 
v1201.2%
 
p360.3%
 
d290.3%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
731100.0%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
&17983.6%
 
,3516.4%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-19100.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
110100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin1166092.3%
 
Common9747.7%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e117910.1%
 
n10919.4%
 
o9698.3%
 
a9438.1%
 
i8567.3%
 
c8437.2%
 
s7126.1%
 
r6625.7%
 
t6545.6%
 
h4533.9%
 
l4113.5%
 
u3933.4%
 
m3312.8%
 
I2492.1%
 
f2482.1%
 
g2422.1%
 
T2101.8%
 
B2091.8%
 
y1951.7%
 
P1211.0%
 
v1201.0%
 
S1010.9%
 
C560.5%
 
H490.4%
 
E490.4%
 
Other values (12)3142.7%
 

Most frequent Common characters

ValueCountFrequency (%) 
73175.1%
 
&17918.4%
 
,353.6%
 
-192.0%
 
1101.0%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII12634100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
e11799.3%
 
n10918.6%
 
o9697.7%
 
a9437.5%
 
i8566.8%
 
c8436.7%
 
7315.8%
 
s7125.6%
 
r6625.2%
 
t6545.2%
 
h4533.6%
 
l4113.3%
 
u3933.1%
 
m3312.6%
 
I2492.0%
 
f2482.0%
 
g2421.9%
 
T2101.7%
 
B2091.7%
 
y1951.5%
 
&1791.4%
 
P1211.0%
 
v1200.9%
 
S1010.8%
 
C560.4%
 
Other values (17)4763.8%
 

Revenue
Categorical

Distinct count: 14
Unique (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB

Value | Count | Frequency (%)
Unknown / Non-Applicable | 203 | 27.4%
$10+ billion (USD) | 124 | 16.7%
$100 to $500 million (USD) | 91 | 12.3%
$1 to $2 billion (USD) | 60 | 8.1%
$500 million to $1 billion (USD) | 57 | 7.7%
$50 to $100 million (USD) | 46 | 6.2%
$25 to $50 million (USD) | 40 | 5.4%
$2 to $5 billion (USD) | 39 | 5.3%
$10 to $25 million (USD) | 32 | 4.3%
$5 to $10 billion (USD) | 19 | 2.6%
$5 to $10 million (USD) | 18 | 2.4%
$1 to $5 million (USD) | 8 | 1.1%
Less than $1 million (USD) | 4 | 0.5%
-1 | 1 | 0.1%
 

Length

Max length: 32
Median length: 24
Mean length: 23.56199461
Min length: 2

Overview of Unicode Properties

Unique Unicode characters: 32
Unique Unicode categories: 10
Unique Unicode scripts: 2
Unique Unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
236713.5%
 
l15969.1%
 
o14118.1%
 
n14118.1%
 
i13938.0%
 
$9485.4%
 
08494.9%
 
U7414.2%
 
(5383.1%
 
S5383.1%
 
D5383.1%
 
)5383.1%
 
b5022.9%
 
14602.6%
 
t4142.4%
 
p4062.3%
 
53902.2%
 
m2961.7%
 
a2071.2%
 
e2071.2%
 
-2041.2%
 
k2031.2%
 
w2031.2%
 
/2031.2%
 
N2031.2%
 
Other values (7)7174.1%
 

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 8464 | 48.4%
Space Separator | 2367 | 13.5%
Uppercase Letter | 2227 | 12.7%
Decimal Number | 1870 | 10.7%
Currency Symbol | 948 | 5.4%
Open Punctuation | 538 | 3.1%
Close Punctuation | 538 | 3.1%
Dash Punctuation | 204 | 1.2%
Other Punctuation | 203 | 1.2%
Math Symbol | 124 | 0.7%
 

Most frequent Currency Symbol characters

ValueCountFrequency (%) 
$948100.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
084945.4%
 
146024.6%
 
539020.9%
 
21719.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
2367100.0%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
l159618.9%
 
o141116.7%
 
n141116.7%
 
i139316.5%
 
b5025.9%
 
t4144.9%
 
p4064.8%
 
m2963.5%
 
a2072.4%
 
e2072.4%
 
k2032.4%
 
w2032.4%
 
c2032.4%
 
s80.1%
 
h4< 0.1%
 

Most frequent Open Punctuation characters

ValueCountFrequency (%) 
(538100.0%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
U74133.3%
 
S53824.2%
 
D53824.2%
 
N2039.1%
 
A2039.1%
 
L40.2%
 

Most frequent Close Punctuation characters

ValueCountFrequency (%) 
)538100.0%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
/203100.0%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-204100.0%
 

Most frequent Math Symbol characters

ValueCountFrequency (%) 
+124100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin1069161.2%
 
Common679238.8%
 

Most frequent Common characters

ValueCountFrequency (%) 
236734.8%
 
$94814.0%
 
084912.5%
 
(5387.9%
 
)5387.9%
 
14606.8%
 
53905.7%
 
-2043.0%
 
/2033.0%
 
21712.5%
 
+1241.8%
 

Most frequent Latin characters

ValueCountFrequency (%) 
l159614.9%
 
o141113.2%
 
n141113.2%
 
i139313.0%
 
U7416.9%
 
S5385.0%
 
D5385.0%
 
b5024.7%
 
t4143.9%
 
p4063.8%
 
m2962.8%
 
a2071.9%
 
e2071.9%
 
k2031.9%
 
w2031.9%
 
N2031.9%
 
A2031.9%
 
c2031.9%
 
s80.1%
 
L4< 0.1%
 
h4< 0.1%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII17483100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
236713.5%
 
l15969.1%
 
o14118.1%
 
n14118.1%
 
i13938.0%
 
$9485.4%
 
08494.9%
 
U7414.2%
 
(5383.1%
 
S5383.1%
 
D5383.1%
 
)5383.1%
 
b5022.9%
 
14602.6%
 
t4142.4%
 
p4062.3%
 
53902.2%
 
m2961.7%
 
a2071.2%
 
e2071.2%
 
-2041.2%
 
k2031.2%
 
w2031.2%
 
/2031.2%
 
N2031.2%
 
Other values (7)7174.1%
 

Competitors
Categorical

HIGH CARDINALITY

Distinct count128
Unique (%)17.3%
Missing0
Missing (%)0.0%
Memory size5.9 KiB
-1
460
Novartis, Baxter, Pfizer
 
14
Oak Ridge National Laboratory, National Renewable Energy Lab, Los Alamos National Laboratory
 
12
Travelers, Allstate, State Farm
 
10
Roche, GlaxoSmithKline, Novartis
 
9
Other values (123)
237
ValueCountFrequency (%) 
-146062.0%
 
Novartis, Baxter, Pfizer141.9%
 
Oak Ridge National Laboratory, National Renewable Energy Lab, Los Alamos National Laboratory121.6%
 
Travelers, Allstate, State Farm101.3%
 
Roche, GlaxoSmithKline, Novartis91.2%
 
Battelle, General Atomics, SAIC81.1%
 
Expedia Group, Orbitz Worldwide, Priceline.com70.9%
 
Pitney Bowes60.8%
 
Leidos, CACI International, Booz Allen Hamilton60.8%
 
FLURRY, Chartboost60.8%
 
Shire, GlaxoSmithKline, Allergan50.7%
 
Pfizer, AstraZeneca, Merck40.5%
 
TravelCenters of America, Love's Travel Stops & Country Stores, Wawa40.5%
 
UDR, AvalonBay Communities, Essex Property Trust30.4%
 
Munich Re, Hannover RE, SCOR30.4%
 
Tata Consultancy Services, Accenture, Cognizant Technology Solutions30.4%
 
See Tickets, TicketWeb, Vendini30.4%
 
Los Alamos National Laboratory, Battelle, SRI International30.4%
 
American Express, Mastercard, Discover30.4%
 
BioMarin Pharmaceutical, Sangamo Therapeutics, bluebird bio30.4%
 
Seagate Technology, Toshiba30.4%
 
Credit Karma, LendUp, SoFi30.4%
 
ManTech, Booz Allen Hamilton, Leidos30.4%
 
Nationstar Mortgage, Caliber Funding, Quicken Loans30.4%
 
C.H. Robinson, Total Quality Logistics, Coyote Logistics30.4%
 
Other values (103)15520.9%
 

Length

Max length92
Median length2
Mean length15.97439353
Min length2

Overview of Unicode Properties

Unique unicode characters63
Unique unicode categories (?)9
Unique unicode scripts (?)2
Unique unicode blocks (?)1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
10649.0%
 
e9407.9%
 
a8226.9%
 
o6725.7%
 
t6545.5%
 
r6345.3%
 
i6265.3%
 
n5574.7%
 
,5004.2%
 
l4904.1%
 
-4673.9%
 
14603.9%
 
s4003.4%
 
c2892.4%
 
m2011.7%
 
h1811.5%
 
u1601.3%
 
d1591.3%
 
A1561.3%
 
S1561.3%
 
C1441.2%
 
y1431.2%
 
b1221.0%
 
g1201.0%
 
N1181.0%
 
Other values (38)161813.7%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter762764.3%
 
Uppercase Letter168214.2%
 
Space Separator10649.0%
 
Other Punctuation5414.6%
 
Dash Punctuation4673.9%
 
Decimal Number4623.9%
 
Math Symbol4< 0.1%
 
Open Punctuation3< 0.1%
 
Close Punctuation3< 0.1%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-467100.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
146099.6%
 
920.4%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
A1569.3%
 
S1569.3%
 
C1448.6%
 
N1187.0%
 
L1136.7%
 
T995.9%
 
R945.6%
 
B865.1%
 
I865.1%
 
M744.4%
 
P704.2%
 
G704.2%
 
H663.9%
 
E623.7%
 
F533.2%
 
O503.0%
 
D462.7%
 
U332.0%
 
K311.8%
 
W261.5%
 
Y140.8%
 
Z90.5%
 
Q80.5%
 
V70.4%
 
J60.4%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e94012.3%
 
a82210.8%
 
o6728.8%
 
t6548.6%
 
r6348.3%
 
i6268.2%
 
n5577.3%
 
l4906.4%
 
s4005.2%
 
c2893.8%
 
m2012.6%
 
h1812.4%
 
u1602.1%
 
d1592.1%
 
y1431.9%
 
b1221.6%
 
g1201.6%
 
v861.1%
 
p751.0%
 
k680.9%
 
f630.8%
 
z570.7%
 
x530.7%
 
w450.6%
 
q100.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
1064100.0%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
,50092.4%
 
.193.5%
 
&132.4%
 
'91.7%
 

Most frequent Math Symbol characters

ValueCountFrequency (%) 
+250.0%
 
|250.0%
 

Most frequent Open Punctuation characters

ValueCountFrequency (%) 
(3100.0%
 

Most frequent Close Punctuation characters

ValueCountFrequency (%) 
)3100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin930978.5%
 
Common254421.5%
 

Most frequent Common characters

ValueCountFrequency (%) 
106441.8%
 
,50019.7%
 
-46718.4%
 
146018.1%
 
.190.7%
 
&130.5%
 
'90.4%
 
(30.1%
 
)30.1%
 
+20.1%
 
920.1%
 
|20.1%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e94010.1%
 
a8228.8%
 
o6727.2%
 
t6547.0%
 
r6346.8%
 
i6266.7%
 
n5576.0%
 
l4905.3%
 
s4004.3%
 
c2893.1%
 
m2012.2%
 
h1811.9%
 
u1601.7%
 
d1591.7%
 
A1561.7%
 
S1561.7%
 
C1441.5%
 
y1431.5%
 
b1221.3%
 
g1201.3%
 
N1181.3%
 
L1131.2%
 
T991.1%
 
R941.0%
 
v860.9%
 
Other values (26)117312.6%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII11853100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
10649.0%
 
e9407.9%
 
a8226.9%
 
o6725.7%
 
t6545.5%
 
r6345.3%
 
i6265.3%
 
n5574.7%
 
,5004.2%
 
l4904.1%
 
-4673.9%
 
14603.9%
 
s4003.4%
 
c2892.4%
 
m2011.7%
 
h1811.5%
 
u1601.3%
 
d1591.3%
 
A1561.3%
 
S1561.3%
 
C1441.2%
 
y1431.2%
 
b1221.0%
 
g1201.0%
 
N1181.0%
 
Other values (38)161813.7%
 

hourly
Boolean

Distinct count2
Unique (%)0.3%
Missing0
Missing (%)0.0%
Memory size5.9 KiB
0
718
1
 
24
ValueCountFrequency (%) 
071896.8%
 
1243.2%
 

employer_provided
Boolean

Distinct count2
Unique (%)0.3%
Missing0
Missing (%)0.0%
Memory size5.9 KiB
0
725
1
 
17
ValueCountFrequency (%) 
072597.7%
 
1172.3%
 

min_salary
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 119
Unique (%): 16.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 74.06873315363882
Minimum: 10
Maximum: 202
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 10
5-th percentile: 32
Q1: 52
median: 69.5
Q3: 91
95-th percentile: 127
Maximum: 202
Range: 192
Interquartile range (IQR): 39

Descriptive statistics

Standard deviation: 31.86928207
Coefficient of variation (CV): 0.4302663312
Kurtosis: 1.728415099
Mean: 74.06873315
Median Absolute Deviation (MAD): 19.5
Skewness: 0.9632862889
Sum: 54959
Variance: 1015.65114
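
Several of the min_salary figures above are derived from one another, so they can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers reported in this section and the standard definitions (range = maximum - minimum, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared, mean = sum / number of observations). The variable names are chosen for this example only.

```python
# Values copied from the min_salary statistics reported above.
n_observations = 742
total = 54959
mean = 74.06873315
std = 31.86928207
variance = 1015.65114
minimum, maximum = 10, 202
q1, q3 = 52, 91

value_range = maximum - minimum      # reported as Range
iqr = q3 - q1                        # reported as Interquartile range (IQR)
cv = std / mean                      # reported as Coefficient of variation (CV)
mean_from_sum = total / n_observations

print("range:", value_range)
print("IQR:", iqr)
print("CV:", round(cv, 10))
print("std squared:", round(std ** 2, 5))
print("sum / n:", round(mean_from_sum, 8))
```

Within rounding, each recomputed value agrees with the figure the report prints.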
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
65202.7%
 
80182.4%
 
61182.4%
 
81172.3%
 
63162.2%
 
74162.2%
 
86152.0%
 
56152.0%
 
52152.0%
 
49152.0%
 
60152.0%
 
43141.9%
 
54141.9%
 
42141.9%
 
71131.8%
 
44121.6%
 
68111.5%
 
100111.5%
 
110111.5%
 
50101.3%
 
83101.3%
 
39101.3%
 
76101.3%
 
75101.3%
 
64101.3%
 
Other values (94)40254.2%
 
ValueCountFrequency (%) 
1020.3%
 
1520.3%
 
1720.3%
 
1840.5%
 
2010.1%
 
2181.1%
 
2430.4%
 
2530.4%
 
2610.1%
 
2730.4%
 
ValueCountFrequency (%) 
20230.4%
 
20030.4%
 
19030.4%
 
17610.1%
 
17110.1%
 
15820.3%
 
15070.9%
 
13930.4%
 
13830.4%
 
13610.1%
 

max_salary
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 163
Unique (%): 22.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 127.18328840970351
Minimum: 16
Maximum: 306
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 16
5-th percentile: 59.1
Q1: 96
median: 124
Q3: 155
95-th percentile: 208
Maximum: 306
Range: 290
Interquartile range (IQR): 59

Descriptive statistics

Standard deviation: 46.90900648
Coefficient of variation (CV): 0.3688299545
Kurtosis: 0.6089039058
Mean: 127.1832884
Median Absolute Deviation (MAD): 29
Skewness: 0.4313542674
Sum: 94370
Variance: 2200.454889
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
140162.2%
 
119152.0%
 
124152.0%
 
110152.0%
 
127131.8%
 
113131.8%
 
86121.6%
 
101121.6%
 
173121.6%
 
85111.5%
 
139111.5%
 
62101.3%
 
142101.3%
 
160101.3%
 
134101.3%
 
11291.2%
 
12391.2%
 
13391.2%
 
9991.2%
 
9791.2%
 
14981.1%
 
14381.1%
 
10581.1%
 
12981.1%
 
13281.1%
 
Other values (138)47263.6%
 
ValueCountFrequency (%) 
1610.1%
 
1720.3%
 
2420.3%
 
2550.7%
 
2830.4%
 
2920.3%
 
3460.8%
 
3940.5%
 
4710.1%
 
4820.3%
 
ValueCountFrequency (%) 
30630.4%
 
28910.1%
 
27510.1%
 
27210.1%
 
25020.3%
 
23920.3%
 
23820.3%
 
23110.1%
 
22820.3%
 
22430.4%
 

avg_salary
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 225
Unique (%): 30.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 100.62601078167116
Minimum: 13.5
Maximum: 254.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 13.5
5-th percentile: 45.575
Q1: 73.5
median: 97.5
Q3: 122.5
95-th percentile: 167.5
Maximum: 254
Range: 240.5
Interquartile range (IQR): 49

Descriptive statistics

Standard deviation: 38.85594816
Coefficient of variation (CV): 0.3861421898
Kurtosis: 0.8891961858
Mean: 100.6260108
Median Absolute Deviation (MAD): 24.5
Skewness: 0.6094736593
Sum: 74664.5
Variance: 1509.784707
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
87.5121.6%
 
81111.5%
 
140111.5%
 
84.5101.3%
 
107.5101.3%
 
85101.3%
 
107101.3%
 
12091.2%
 
8791.2%
 
70.581.1%
 
10981.1%
 
154.581.1%
 
80.570.9%
 
12170.9%
 
85.570.9%
 
62.570.9%
 
77.570.9%
 
114.570.9%
 
76.570.9%
 
9570.9%
 
6170.9%
 
10070.9%
 
27.560.8%
 
5460.8%
 
94.560.8%
 
Other values (200)53872.5%
 
ValueCountFrequency (%) 
13.520.3%
 
15.510.1%
 
2010.1%
 
20.520.3%
 
21.540.5%
 
2520.3%
 
26.530.4%
 
27.560.8%
 
29.510.1%
 
31.530.4%
 
ValueCountFrequency (%) 
25430.4%
 
237.510.1%
 
232.510.1%
 
22520.3%
 
221.510.1%
 
20530.4%
 
194.520.3%
 
19420.3%
 
184.520.3%
 
18130.4%
 

company_txt
Categorical

HIGH CARDINALITY

Distinct count343
Unique (%)46.2%
Missing0
Missing (%)0.0%
Memory size5.9 KiB
Reynolds American
 
14
MassMutual
 
14
Takeda Pharmaceuticals
 
14
Software Engineering Institute
 
11
Liberty Mutual Insurance
 
10
Other values (338)
679
ValueCountFrequency (%) 
Reynolds American 141.9%
 
MassMutual 141.9%
 
Takeda Pharmaceuticals 141.9%
 
Software Engineering Institute 111.5%
 
Liberty Mutual Insurance 101.3%
 
PNNL 101.3%
 
AstraZeneca 91.2%
 
MITRE 81.1%
 
Pfizer 70.9%
 
Novartis 70.9%
 
Advanced BioScience Laboratories 70.9%
 
Numeric, LLC 70.9%
 
Fareportal 70.9%
 
Rochester Regional Health 70.9%
 
The Church of Jesus Christ of Latter-day Saints 60.8%
 
Tapjoy 60.8%
 
Q2 Solutions 60.8%
 
Novetta 60.8%
 
Beebe Healthcare 60.8%
 
Kronos Bio60.8%
 
Esri 60.8%
 
Rubius Therapeutics 50.7%
 
Sunovion 50.7%
 
The Hanover Insurance Group 50.7%
 
Autodesk 40.5%
 
Other values (318)54974.0%
 

Length

Max length52
Median length14
Mean length16.22506739
Min length3

Overview of Unicode Properties

Unique unicode characters71
Unique unicode categories (?)8
Unique unicode scripts (?)2
Unique unicode blocks (?)1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
e10198.5%
 
a8917.4%
 
7676.4%
 
7316.1%
 
n6885.7%
 
t6855.7%
 
i6775.6%
 
r6575.5%
 
o6315.2%
 
s5965.0%
 
c4083.4%
 
l4083.4%
 
u3402.8%
 
h2562.1%
 
C1921.6%
 
d1811.5%
 
S1781.5%
 
m1671.4%
 
g1521.3%
 
T1511.3%
 
A1481.2%
 
p1391.2%
 
y1381.1%
 
I1351.1%
 
L1221.0%
 
Other values (46)158213.1%
 

Most occurring categories

ValueCountFrequency (%) 
Lowercase Letter843370.0%
 
Uppercase Letter195116.2%
 
Space Separator7676.4%
 
Control7316.1%
 
Other Punctuation810.7%
 
Decimal Number540.4%
 
Dash Punctuation180.1%
 
Math Symbol4< 0.1%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
C1929.8%
 
S1789.1%
 
T1517.7%
 
A1487.6%
 
I1356.9%
 
L1226.3%
 
M1176.0%
 
P1115.7%
 
R1095.6%
 
E944.8%
 
N874.5%
 
H804.1%
 
B773.9%
 
G653.3%
 
F542.8%
 
D361.8%
 
O321.6%
 
V291.5%
 
U241.2%
 
K241.2%
 
W211.1%
 
Z211.1%
 
J201.0%
 
Q160.8%
 
Y50.3%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
e101912.1%
 
a89110.6%
 
n6888.2%
 
t6858.1%
 
i6778.0%
 
r6577.8%
 
o6317.5%
 
s5967.1%
 
c4084.8%
 
l4084.8%
 
u3404.0%
 
h2563.0%
 
d1812.1%
 
m1672.0%
 
g1521.8%
 
p1391.6%
 
y1381.6%
 
v871.0%
 
f841.0%
 
b760.9%
 
k530.6%
 
w470.6%
 
x200.2%
 
z170.2%
 
j90.1%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
767100.0%
 

Most frequent Control characters

ValueCountFrequency (%) 
731100.0%
 

Most frequent Decimal Number characters

ValueCountFrequency (%) 
22037.0%
 
01018.5%
 
3713.0%
 
4611.1%
 
159.3%
 
923.7%
 
623.7%
 
711.9%
 
811.9%
 

Most frequent Other Punctuation characters

ValueCountFrequency (%) 
.3745.7%
 
,2125.9%
 
&1316.0%
 
'89.9%
 
/22.5%
 

Most frequent Math Symbol characters

ValueCountFrequency (%) 
<250.0%
 
>250.0%
 

Most frequent Dash Punctuation characters

ValueCountFrequency (%) 
-18100.0%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin1038486.3%
 
Common165513.7%
 

Most frequent Latin characters

ValueCountFrequency (%) 
e10199.8%
 
a8918.6%
 
n6886.6%
 
t6856.6%
 
i6776.5%
 
r6576.3%
 
o6316.1%
 
s5965.7%
 
c4083.9%
 
l4083.9%
 
u3403.3%
 
h2562.5%
 
C1921.8%
 
d1811.7%
 
S1781.7%
 
m1671.6%
 
g1521.5%
 
T1511.5%
 
A1481.4%
 
p1391.3%
 
y1381.3%
 
I1351.3%
 
L1221.2%
 
M1171.1%
 
P1111.1%
 
Other values (27)119711.5%
 

Most frequent Common characters

ValueCountFrequency (%) 
76746.3%
 
73144.2%
 
.372.2%
 
,211.3%
 
2201.2%
 
-181.1%
 
&130.8%
 
0100.6%
 
'80.5%
 
370.4%
 
460.4%
 
150.3%
 
/20.1%
 
<20.1%
 
>20.1%
 
920.1%
 
620.1%
 
710.1%
 
810.1%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII12039100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
e10198.5%
 
a8917.4%
 
7676.4%
 
7316.1%
 
n6885.7%
 
t6855.7%
 
i6775.6%
 
r6575.5%
 
o6315.2%
 
s5965.0%
 
c4083.4%
 
l4083.4%
 
u3402.8%
 
h2562.1%
 
C1921.6%
 
d1811.5%
 
S1781.5%
 
m1671.4%
 
g1521.3%
 
T1511.3%
 
A1481.2%
 
p1391.2%
 
y1381.1%
 
I1351.1%
 
L1221.0%
 
Other values (46)158213.1%
 

job_state
Categorical

Distinct count38
Unique (%)5.1%
Missing0
Missing (%)0.0%
Memory size5.9 KiB
CA
151
MA
103
NY
72
VA
 
41
IL
 
40
Other values (33)
335
ValueCountFrequency (%) 
CA15120.4%
 
MA10313.9%
 
NY729.7%
 
VA415.5%
 
IL405.4%
 
MD354.7%
 
PA334.4%
 
TX283.8%
 
WA212.8%
 
NC212.8%
 
NJ172.3%
 
FL162.2%
 
OH141.9%
 
TN131.8%
 
DC111.5%
 
CO111.5%
 
UT101.3%
 
WI101.3%
 
IN101.3%
 
MO91.2%
 
AZ91.2%
 
AL81.1%
 
DE60.8%
 
MI60.8%
 
KY60.8%
 
Other values (13)415.5%
 

Length

Max length12
Median length3
Mean length3.01212938
Min length3

Overview of Unicode Properties

Unique unicode characters31
Unique unicode categories (?)3
Unique unicode scripts (?)2
Unique unicode blocks (?)1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

ValueCountFrequency (%) 
74333.2%
 
A38217.1%
 
C2008.9%
 
M1587.1%
 
N1426.4%
 
Y783.5%
 
I743.3%
 
L693.1%
 
T562.5%
 
D542.4%
 
V411.8%
 
O381.7%
 
P331.5%
 
W311.4%
 
X281.3%
 
J170.8%
 
F160.7%
 
H140.6%
 
E100.4%
 
U100.4%
 
K90.4%
 
Z90.4%
 
G60.3%
 
R50.2%
 
S40.2%
 
Other values (6)80.4%
 

Most occurring categories

ValueCountFrequency (%) 
Uppercase Letter148466.4%
 
Space Separator74333.2%
 
Lowercase Letter80.4%
 

Most frequent Space Separator characters

ValueCountFrequency (%) 
743100.0%
 

Most frequent Uppercase Letter characters

ValueCountFrequency (%) 
A38225.7%
 
C20013.5%
 
M15810.6%
 
N1429.6%
 
Y785.3%
 
I745.0%
 
L694.6%
 
T563.8%
 
D543.6%
 
V412.8%
 
O382.6%
 
P332.2%
 
W312.1%
 
X281.9%
 
J171.1%
 
F161.1%
 
H140.9%
 
E100.7%
 
U100.7%
 
K90.6%
 
Z90.6%
 
G60.4%
 
R50.3%
 
S40.3%
 

Most frequent Lowercase Letter characters

ValueCountFrequency (%) 
s225.0%
 
e225.0%
 
o112.5%
 
n112.5%
 
g112.5%
 
l112.5%
 

Most occurring scripts

ValueCountFrequency (%) 
Latin149266.8%
 
Common74333.2%
 

Most frequent Common characters

ValueCountFrequency (%) 
743100.0%
 

Most frequent Latin characters

ValueCountFrequency (%) 
A38225.6%
 
C20013.4%
 
M15810.6%
 
N1429.5%
 
Y785.2%
 
I745.0%
 
L694.6%
 
T563.8%
 
D543.6%
 
V412.7%
 
O382.5%
 
P332.2%
 
W312.1%
 
X281.9%
 
J171.1%
 
F161.1%
 
H140.9%
 
E100.7%
 
U100.7%
 
K90.6%
 
Z90.6%
 
G60.4%
 
R50.3%
 
S40.3%
 
s20.1%
 
Other values (5)60.4%
 

Most occurring blocks

ValueCountFrequency (%) 
ASCII2235100.0%
 

Most frequent ASCII characters

ValueCountFrequency (%) 
74333.2%
 
A38217.1%
 
C2008.9%
 
M1587.1%
 
N1426.4%
 
Y783.5%
 
I743.3%
 
L693.1%
 
T562.5%
 
D542.4%
 
V411.8%
 
O381.7%
 
P331.5%
 
W311.4%
 
X281.3%
 
J170.8%
 
F160.7%
 
H140.6%
 
E100.4%
 
U100.4%
 
K90.4%
 
Z90.4%
 
G60.3%
 
R50.2%
 
S40.2%
 
Other values (6)80.4%
 

same_state
Boolean

Distinct count2
Unique (%)0.3%
Missing0
Missing (%)0.0%
Memory size5.9 KiB
1
414
0
328
ValueCountFrequency (%) 
141455.8%
 
032844.2%
 

age
Real number (ℝ)

Distinct count: 102
Unique (%): 13.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 46.591644204851754
Minimum: -1
Maximum: 276
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: -1
5-th percentile: -1
Q1: 11
median: 24
Q3: 59
95-th percentile: 169
Maximum: 276
Range: 277
Interquartile range (IQR): 48

Descriptive statistics

Standard deviation: 53.77881512
Coefficient of variation (CV): 1.154258795
Kurtosis: 2.791116976
Mean: 46.5916442
Median Absolute Deviation (MAD): 17
Skewness: 1.78661821
Sum: 34571
Variance: 2892.160956
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
-1506.7%
 
10324.3%
 
12314.2%
 
24273.6%
 
14243.2%
 
8212.8%
 
9192.6%
 
62182.4%
 
18182.4%
 
36182.4%
 
13182.4%
 
5162.2%
 
7152.0%
 
169141.9%
 
145141.9%
 
23141.9%
 
239141.9%
 
6131.8%
 
55121.6%
 
21121.6%
 
3121.6%
 
85101.3%
 
20101.3%
 
108101.3%
 
17101.3%
 
Other values (77)29039.1%
 
ValueCountFrequency (%) 
-1506.7%
 
120.3%
 
3121.6%
 
450.7%
 
5162.2%
 
6131.8%
 
7152.0%
 
8212.8%
 
9192.6%
 
10324.3%
 
ValueCountFrequency (%) 
27610.1%
 
239141.9%
 
20810.1%
 
19040.5%
 
17420.3%
 
17170.9%
 
17010.1%
 
169141.9%
 
16850.7%
 
16420.3%
 

python_yn
Boolean

Distinct count2
Unique (%)0.3%
Missing0
Missing (%)0.0%
Memory size5.9 KiB
1
392
0
350
ValueCountFrequency (%) 
139252.8%
 
035047.2%
 

R_yn
Boolean

Distinct count2
Unique (%)0.3%
Missing0
Missing (%)0.0%
Memory size5.9 KiB
0
740
1
 
2
ValueCountFrequency (%) 
074099.7%
 
120.3%
 

spark
Boolean

Distinct count2
Unique (%)0.3%
Missing0
Missing (%)0.0%
Memory size5.9 KiB
0
575
1
167
ValueCountFrequency (%) 
057577.5%
 
116722.5%
 

aws
Boolean

Distinct count2
Unique (%)0.3%
Missing0
Missing (%)0.0%
Memory size5.9 KiB
0
566
1
176
ValueCountFrequency (%) 
056676.3%
 
117623.7%
 

excel
Boolean

Distinct count2
Unique (%)0.3%
Missing0
Missing (%)0.0%
Memory size5.9 KiB
1
388
0
354
ValueCountFrequency (%) 
138852.3%
 
035447.7%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
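
The calculation described above can be sketched in plain Python as follows. This is a minimal illustration of the textbook definition, not the code the report itself runs; the function name `pearson_r` and the toy data are assumptions made for the example.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


# Toy data: y is an exact increasing linear function of x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
r = pearson_r(x, y)

# Shifting and rescaling y (by a positive factor) leaves r unchanged.
r_shifted = pearson_r(x, [10 * b + 5 for b in y])

# Reversing the relationship flips the sign.
r_negative = pearson_r(x, [9, 7, 5, 3])
print(r, r_shifted, r_negative)
```

The same divisor (n) is used for the covariance and both standard deviations, so it cancels and the result does not depend on whether population or sample formulas are chosen.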

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
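
A minimal sketch of that procedure: convert each variable to ranks, then apply the Pearson formula to the ranks. Tied values receive the average of the ranks they span, which is a common convention and an assumption here. The helper names `average_ranks` and `spearman_rho` are made up for this example.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# A nonlinear but strictly increasing relationship still gives rho = 1.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
tied = average_ranks([10, 20, 20, 30])
print(rho, tied)
```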

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
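
The pair-counting definition above can be written out directly. The sketch below implements exactly that formula (often called tau-a); tie-adjusted variants used by many statistics libraries can give different values when the data contain ties, and which variant the report uses is not stated here. The function name `kendall_tau_a` and the toy data are assumptions made for the example.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Pairs tied in x or y count as neither concordant nor discordant.
    return (concordant - discordant) / len(pairs)


# Six pairs in total: five concordant, one discordant (the 2nd and 3rd points).
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

This direct version compares every pair, so its cost grows quadratically with the number of observations.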

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
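
For readers who want to see the shape of the computation, here is a minimal sketch of a bias-corrected Cramér's V following the correction commonly attributed to Bergsma (2013): the chi-squared statistic is divided by n, a bias term is subtracted, and the row and column counts are shrunk slightly before normalising. It is an illustration under those assumptions, not the report's own code; the function name `cramers_v_corrected` and the toy data are hypothetical.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equal-length sequences of labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Subtract the bias term and clip at zero.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Perfect association: each label of x maps to exactly one label of y.
x_assoc = ["a"] * 5 + ["b"] * 5
y_assoc = ["u"] * 5 + ["v"] * 5
v_perfect = cramers_v_corrected(x_assoc, y_assoc)

# Independence: every combination of labels occurs equally often.
v_independent = cramers_v_corrected(["a", "a", "b", "b"], ["u", "v", "u", "v"])
print(v_perfect, v_independent)
```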

Missing values

Sample

First rows

index | Job Title | Rating | Location | Headquarters | Size | Founded | Type of ownership | Industry | Sector | Revenue | Competitors | hourly | employer_provided | min_salary | max_salary | avg_salary | company_txt | job_state | same_state | age | python_yn | R_yn | spark | aws | excel
0 | Data Scientist | 3.8 | Albuquerque, NM | Goleta, CA | 501 to 1000 employees | 1973 | Company - Private | Aerospace & Defense | Aerospace & Defense | $50 to $100 million (USD) | -1 | 0 | 0 | 53 | 91 | 72.0 | Tecolote Research\n | NM | 0 | 47 | 1 | 0 | 0 | 0 | 1
1 | Healthcare Data Scientist | 3.4 | Linthicum, MD | Baltimore, MD | 10000+ employees | 1984 | Other Organization | Health Care Services & Hospitals | Health Care | $2 to $5 billion (USD) | -1 | 0 | 0 | 63 | 112 | 87.5 | University of Maryland Medical System\n | MD | 0 | 36 | 1 | 0 | 0 | 0 | 0
2 | Data Scientist | 4.8 | Clearwater, FL | Clearwater, FL | 501 to 1000 employees | 2010 | Company - Private | Security Services | Business Services | $100 to $500 million (USD) | -1 | 0 | 0 | 80 | 90 | 85.0 | KnowBe4\n | FL | 1 | 10 | 1 | 0 | 1 | 0 | 1
3 | Data Scientist | 3.8 | Richland, WA | Richland, WA | 1001 to 5000 employees | 1965 | Government | Energy | Oil, Gas, Energy & Utilities | $500 million to $1 billion (USD) | Oak Ridge National Laboratory, National Renewable Energy Lab, Los Alamos National Laboratory | 0 | 0 | 56 | 97 | 76.5 | PNNL\n | WA | 1 | 55 | 1 | 0 | 0 | 0 | 0
4 | Data Scientist | 2.9 | New York, NY | New York, NY | 51 to 200 employees | 1998 | Company - Private | Advertising & Marketing | Business Services | Unknown / Non-Applicable | Commerce Signals, Cardlytics, Yodlee | 0 | 0 | 86 | 143 | 114.5 | Affinity Solutions\n | NY | 1 | 22 | 1 | 0 | 0 | 0 | 1
5 | Data Scientist | 3.4 | Dallas, TX | Dallas, TX | 201 to 500 employees | 2000 | Company - Public | Real Estate | Real Estate | $1 to $2 billion (USD) | Digital Realty, CoreSite, Equinix | 0 | 0 | 71 | 119 | 95.0 | CyrusOne\n | TX | 1 | 20 | 1 | 0 | 0 | 1 | 1
6 | Data Scientist | 4.1 | Baltimore, MD | Baltimore, MD | 501 to 1000 employees | 2008 | Company - Private | Banks & Credit Unions | Finance | Unknown / Non-Applicable | -1 | 0 | 0 | 54 | 93 | 73.5 | ClearOne Advantage\n | MD | 1 | 12 | 0 | 0 | 0 | 0 | 1
7 | Data Scientist | 3.8 | San Jose, CA | Seattle, WA | 201 to 500 employees | 2005 | Company - Private | Consulting | Business Services | $25 to $50 million (USD) | -1 | 0 | 0 | 86 | 142 | 114.0 | Logic20/20\n | CA | 0 | 15 | 1 | 0 | 1 | 1 | 1
8 | Research Scientist | 3.3 | Rochester, NY | Rochester, NY | 10000+ employees | 2014 | Hospital | Health Care Services & Hospitals | Health Care | $500 million to $1 billion (USD) | -1 | 0 | 0 | 38 | 84 | 61.0 | Rochester Regional Health\n | NY | 1 | 6 | 0 | 0 | 0 | 0 | 0
9 | Data Scientist | 4.6 | New York, NY | New York, NY | 51 to 200 employees | 2009 | Company - Private | Internet | Information Technology | $100 to $500 million (USD) | Clicktripz, SmarterTravel | 0 | 0 | 120 | 160 | 140.0 | <intent>\n | NY | 1 | 11 | 1 | 0 | 1 | 0 | 0

Last rows

index | Job Title | Rating | Location | Headquarters | Size | Founded | Type of ownership | Industry | Sector | Revenue | Competitors | hourly | employer_provided | min_salary | max_salary | avg_salary | company_txt | job_state | same_state | age | python_yn | R_yn | spark | aws | excel
732 | Machine Learning Engineer (NLP) | 4.1 | Palo Alto, CA | Palo Alto, CA | 1 to 50 employees | 2007 | Company - Private | K-12 Education | Education | Unknown / Non-Applicable | -1 | 0 | 0 | 80 | 142 | 111.0 | CK-12 Foundation\n | CA | 1 | 13 | 1 | 0 | 0 | 1 | 1
733 | Senior Data Analyst | 3.9 | San Francisco, CA | San Francisco, CA | 51 to 200 employees | 2008 | Company - Public | Computer Hardware & Software | Information Technology | Unknown / Non-Applicable | -1 | 0 | 0 | 99 | 178 | 138.5 | Life360\n | CA | 1 | 12 | 1 | 0 | 0 | 0 | 0
734 | Data Science Project Manager | 3.6 | Boston, MA | Springfield, MA | 5001 to 10000 employees | 1851 | Company - Private | Insurance Carriers | Insurance | $10+ billion (USD) | -1 | 0 | 0 | 37 | 100 | 68.5 | MassMutual\n | MA | 0 | 169 | 0 | 0 | 0 | 0 | 1
735 | Data Engineer | 3.9 | San Francisco, CA | San Francisco, CA | 201 to 500 employees | 2011 | Company - Private | Internet | Information Technology | $100 to $500 million (USD) | Belly, SpotOn | 0 | 0 | 62 | 113 | 87.5 | Fivestars\n | CA | 1 | 9 | 1 | 0 | 0 | 1 | 1
736 | Principal, Data Science - Advanced Analytics | 3.6 | Plymouth Meeting, PA | Durham, NC | 10000+ employees | 2017 | Company - Public | Biotech & Pharmaceuticals | Biotech & Pharmaceuticals | $2 to $5 billion (USD) | PPD, INC Research, PRA Health Sciences | 0 | 0 | 86 | 137 | 111.5 | IQVIA\n | PA | 0 | 3 | 0 | 0 | 0 | 0 | 0
737 | Sr Scientist, Immuno-Oncology - Oncology | 3.9 | Cambridge, MA | Brentford, United Kingdom | 10000+ employees | 1830 | Company - Public | Biotech & Pharmaceuticals | Biotech & Pharmaceuticals | $10+ billion (USD) | Pfizer, AstraZeneca, Merck | 0 | 0 | 58 | 111 | 84.5 | GSK\n | MA | 0 | 190 | 0 | 0 | 0 | 1 | 0
738 | Senior Data Engineer | 4.4 | Nashville, TN | San Francisco, CA | 1001 to 5000 employees | 2006 | Company - Public | Internet | Information Technology | $100 to $500 million (USD) | See Tickets, TicketWeb, Vendini | 0 | 0 | 72 | 133 | 102.5 | Eventbrite\n | TN | 0 | 14 | 1 | 0 | 1 | 1 | 0
739 | Project Scientist - Auton Lab, Robotics Institute | 2.6 | Pittsburgh, PA | Pittsburgh, PA | 501 to 1000 employees | 1984 | College / University | Colleges & Universities | Education | Unknown / Non-Applicable | -1 | 0 | 0 | 56 | 91 | 73.5 | Software Engineering Institute\n | PA | 1 | 36 | 0 | 0 | 0 | 0 | 1
740 | Data Science Manager | 3.2 | Allentown, PA | Chadds Ford, PA | 1 to 50 employees | -1 | Company - Private | Staffing & Outsourcing | Business Services | $5 to $10 million (USD) | -1 | 0 | 0 | 95 | 160 | 127.5 | Numeric, LLC\n | PA | 0 | -1 | 0 | 0 | 0 | 0 | 1
741 | Research Scientist – Security and Privacy | 3.6 | Beavercreek, OH | Arlington, VA | 501 to 1000 employees | 1967 | Nonprofit Organization | Federal Agencies | Government | $50 to $100 million (USD) | -1 | 0 | 0 | 61 | 126 | 93.5 | Riverside Research Institute\n | OH | 0 | 53 | 1 | 0 | 0 | 0 | 0

Duplicate rows

Most frequent

index | Job Title | Rating | Location | Headquarters | Size | Founded | Type of ownership | Industry | Sector | Revenue | Competitors | hourly | employer_provided | min_salary | max_salary | avg_salary | company_txt | job_state | same_state | age | python_yn | R_yn | spark | aws | excel | count
2 | Analytics Manager - Data Mart | 3.5 | Scotts Valley, CA | Scotts Valley, CA | 501 to 1000 employees | 1996 | Nonprofit Organization | Health Care Services & Hospitals | Health Care | $500 million to $1 billion (USD) | -1 | 0 | 0 | 42 | 86 | 64.0 | Central California Alliance for Health\n | CA | 1 | 24 | 0 | 0 | 0 | 0 | 0 | 4
95 | Food Scientist - Developer | 3.3 | Milwaukee, WI | Milwaukee, WI | 501 to 1000 employees | 1964 | Company - Private | Food & Beverage Manufacturing | Manufacturing | Unknown / Non-Applicable | -1 | 0 | 0 | 40 | 68 | 54.0 | Palermo's Pizza\n | WI | 1 | 56 | 0 | 0 | 0 | 0 | 0 | 4
108 | MED TECH/LAB SCIENTIST- SOUTH COASTAL LAB | 3.6 | Millville, DE | Lewes, DE | 1001 to 5000 employees | 1935 | Nonprofit Organization | Health Care Services & Hospitals | Health Care | $100 to $500 million (USD) | -1 | 1 | 0 | 21 | 34 | 27.5 | Beebe Healthcare\n | DE | 0 | 85 | 0 | 0 | 0 | 0 | 0 | 4
115 | Marketing Data Analyst | 3.6 | Highland, CA | Highland, CA | 1001 to 5000 employees | 1986 | Company - Private | Gambling | Arts, Entertainment & Recreation | $100 to $500 million (USD) | -1 | 0 | 0 | 35 | 62 | 48.5 | San Manuel Casino\n | CA | 1 | 34 | 0 | 0 | 0 | 0 | 1 | 4
120 | Medical Laboratory Scientist | 4.0 | Burleson, TX | Arlington, TX | 1001 to 5000 employees | 1977 | Hospital | Health Care Services & Hospitals | Health Care | $50 to $100 million (USD) | -1 | 1 | 0 | 18 | 25 | 21.5 | Texas Health Huguley Hospital\n | TX | 0 | 43 | 0 | 0 | 0 | 1 | 0 | 4
136 | R&D Specialist/ Food Scientist | 2.4 | Hoopeston, IL | Flower Mound, TX | 501 to 1000 employees | -1 | Company - Private | Food & Beverage Manufacturing | Manufacturing | $100 to $500 million (USD) | -1 | 0 | 0 | 39 | 66 | 52.5 | Teasdale Latin Foods\n | IL | 0 | -1 | 0 | 0 | 0 | 0 | 0 | 4
188 | Senior Research Scientist-Machine Learning | 2.6 | Pittsburgh, PA | Pittsburgh, PA | 501 to 1000 employees | 1984 | College / University | Colleges & Universities | Education | Unknown / Non-Applicable | -1 | 0 | 0 | 81 | 167 | 124.0 | Software Engineering Institute\n | PA | 1 | 36 | 0 | 0 | 0 | 0 | 0 | 4
205 | Sr. Data Engineer - Contract-to-Hire (Java) | 3.0 | Knoxville, TN | Knoxville, TN | 10000+ employees | 1958 | Company - Private | Gas Stations | Retail | $10+ billion (USD) | TravelCenters of America, Love's Travel Stops & Country Stores, Wawa | 0 | 0 | 69 | 127 | 98.0 | Pilot Flying J Travel Centers LLC\n | TN | 1 | 62 | 1 | 0 | 0 | 0 | 0 | 4
217 | Staff Scientist-Downstream Process Development | 2.7 | Rockville, MD | Rockville, MD | 201 to 500 employees | 1961 | Company - Private | Biotech & Pharmaceuticals | Biotech & Pharmaceuticals | $25 to $50 million (USD) | -1 | 0 | 0 | 49 | 113 | 81.0 | Advanced BioScience Laboratories\n | MD | 1 | 59 | 0 | 0 | 0 | 0 | 1 | 4
3 | Associate Data Analyst- Graduate Development Program | 3.3 | Richfield, OH | Richfield, OH | 501 to 1000 employees | 1989 | Company - Private | Insurance Carriers | Insurance | $500 million to $1 billion (USD) | -1 | 0 | 0 | 32 | 59 | 45.5 | National Interstate\n | OH | 1 | 31 | 0 | 0 | 0 | 0 | 0 | 3